TITLE

NOVEL HUMAN α1 CHAIN COLLAGEN



BACKGROUND OF THE INVENTION 
Field of the Invention 

The present invention relates to a novel human collagen
protein and a polynucleotide sequence that encodes the
novel human collagen protein. More particularly, it relates
to polynucleotides encoding human α1 chain collagen and
derivatives thereof.

Description of the Related Art 

Collagens are structural proteins that participate in
the assembly of various kinds of polymers in the
extracellular matrix. Collagen polypeptides contain one or
more blocks of (Gly-X-Y) repeats, in which X represents any
amino acid residue and Y frequently represents a prolyl or
hydroxyprolyl residue. The presence of such sequence
repeats allows groups of three collagen polypeptides to fold
into triple-helical domains, which are rigid and
inextensible.
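
As an illustration of the (Gly-X-Y) repeat pattern described above, the
following Python sketch scans a one-letter amino acid sequence for
uninterrupted runs of Gly-X-Y triplets. The function name
`find_gly_xy_runs`, the `min_repeats` threshold, and the short test
peptides are assumptions made for illustration only; they are not taken
from the sequences of the invention.

```python
import re


def find_gly_xy_runs(seq, min_repeats=3):
    """Find uninterrupted runs of Gly-X-Y triplets in a protein sequence.

    Returns a list of (start, end, n_triplets) tuples, where start is
    0-based and end is exclusive. min_repeats is an illustrative
    threshold, not a value given in the specification.
    """
    # G followed by any two residues, repeated at least min_repeats times
    pattern = re.compile(r"(?:G[A-Z]{2}){%d,}" % min_repeats)
    return [
        (m.start(), m.end(), (m.end() - m.start()) // 3)
        for m in pattern.finditer(seq.upper())
    ]


# Hypothetical peptide: two flanking residues around three Gly-Pro-Pro triplets
runs = find_gly_xy_runs("AAGPPGPPGPPAA")
```

A run interrupted by a residue that breaks the every-third-glycine
spacing ends the match, which mirrors the "interrupted triple helix"
idea only loosely; a real domain annotation would need a more careful
model.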

So far, 20 distinct types of homo- and heterotrimeric
molecules, encoded by more than 30 genes, have been
identified in vertebrates. These proteins exhibit
considerable diversity in size, sequence, tissue
distribution, and molecular composition, and each plays a
different structural role in connective tissue.

Within the collagen superfamily, two categories are
distinguished. The fibrillar collagens include types I, II,
III, V, and XI collagen. The triple-helical domains of these
proteins polymerize in a staggered fashion to form fibrils.
Members of the other category do not by themselves form
cross-striated fibrils but may be associated with fibrils
(FACIT, or fibril-associated collagens with interrupted
triple helices); they include types IX, XII, XIV, XVI, and
XIX collagen. The structure of these molecules comprises two
or more relatively short triple-helical (COL) domains
connected and flanked by non-triple-helical (NC) sequences.
Type IX collagen is the best-characterized member of this
group. Studies of transgenic mice with mutations in type IX
collagen suggest that it acts as a molecular bridge between
cartilage collagen fibrils and other matrix components,
perhaps proteoglycans. The COL domains and the central NC
domains of this molecule interact with type II collagen
through covalent cross-links to form fibrils. The
amino-terminal NC domain has the potential to interact with
other extracellular components. Also, in vitro studies have
demonstrated that the N-terminal non-triple-helical domains
of type XII and XIV collagen promote contraction of collagen
gels. However, the detailed interactions underlying the
bridging hypothesis are not clear.

Collagens are typical mosaic proteins containing a
number of shuffled domains. These domains have been
classified by sequence similarity in order to characterize
their structural and functional relationships to other
proteins. This analysis provides an overview of homologies
of collagen domains. It also reveals two new relationships:
(i) a module common to type V, IX, XI, and XII collagens was
found to be homologous to the heparin-binding domain of
thrombospondin; and (ii) the modular architecture of a human
type VII collagen fragment was identified. Its N-terminal
globular domain contains fibronectin type III repeats
located adjacent to a von Willebrand factor type A module.
The proposed structural similarities point to analogous
subfunctions of the respective domains in otherwise distinct
proteins.

Thrombospondin is one of a class of adhesive
homotrimeric glycoproteins that mediate cell-to-cell and
cell-to-matrix interactions. It is expressed in the
extracellular matrix and may have autocrine
growth-regulatory properties. It is involved in platelet
aggregation, embryogenesis, and morphogenesis, and it acts
as a cell adhesion molecule and a major activator of TGF-β1.
The von Willebrand factor A (vWF)-like domain is the
prototype for a protein superfamily and is found in various
proteins including plasma complement factors, integrins,
collagens, and other extracellular proteins. Proteins that
incorporate vWF domains participate in numerous biological
events, such as cell adhesion, migration, homing, pattern
formation, and signal transduction.

Collagens are important biomedical building blocks used
in tissue growth, anaplasty, burn dressings, wound healing,
and the like, and the demand for them is expanding
considerably. Therefore, there is still a need to develop
a novel collagen and derivatives thereof having greater
therapeutic value and diversity for these various
applications.

The present inventors have successfully cloned a novel
human α1 chain collagen gene by using known human
expressed sequence tags (ESTs) in combination with
bioinformatics and molecular cloning techniques. When the
inventive collagen is compared with the 20 existing
collagens, the highest sequence homology is less than 30%,
indicating that the collagen of the invention is a novel
form.

Blood vessels are tubes of endothelial cells surrounded
by layers of smooth muscle cells and connective tissue
proteins. During development this complex structure forms as
a result of biochemical signals between endothelial cells
and smooth muscle cells. Sometimes this biochemical
communication fails and abnormal blood vessels form. By
analyzing gene mutations causing such vascular
abnormalities, one can learn about the signals necessary for
normal blood vessel development. In addition, identification
of genes responsible for inherited vascular malformations
provides a basis for development of rational therapies in
the clinical treatment of vascular disorders.

SUMMARY OF THE INVENTION 

It is therefore the primary object of the present
invention to provide an isolated nucleic acid (hCOLA1), and
the degenerate sequences thereof, which encodes human α1
chain collagen protein and comprises the nucleotide sequence
set forth in SEQ ID NO. 5. The present invention also
provides the expression profile of the isolated collagen
gene and the exact tissue and cellular localization of this
collagen protein. Moreover, the present invention provides
nucleotide fragments derived from SEQ ID NO. 5 for use as a
nucleic acid probe or primer.

In one preferred embodiment, the present invention
provides a novel human α1 chain collagen protein encoded by



Client's Ref.: BMEC-89-24/ 10-30-2001 
File: 0648-5921 -US/Final/ Frank/Chiumeow 

the nucleic acid mentioned above, which has the amino acid 
sequence set forth in SEQ ID NO. 1. 

Another aspect of the present invention provides a
recombinant vector comprising the nucleic acid mentioned
above and a regulatory sequence.

Still another aspect of the present invention provides
a method for producing human α1 chain collagen protein,
comprising the steps of: (a) transforming or transfecting a
host cell with the recombinant vector described above; (b)
culturing said transformed or transfected cell under
conditions sufficient for expression of the human α1 chain
collagen protein; and (c) recovering and purifying the human
α1 chain collagen protein.

Yet still another aspect of the present invention
provides a diagnostic kit for detecting a disease related
to mutation of SEQ ID NO. 5 in a mammal or human,
comprising the nucleic acid probe or primer described above.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention will be more fully understood and
further advantages will become apparent when reference is
made to the following description of the invention and the
accompanying drawings in which:

FIG. 1 is a diagram showing the PCR cloning of the
human α1 chain collagen cDNA of the invention, wherein lane
1 is the molecular weight markers; lane 2 is the negative
control containing no human cDNA; and lane 3 is the PCR
product containing the cDNA coding for the human α1 chain
collagen of the invention.

FIG. 2 is a diagram showing the construct of the
recombinant vector Bluescript KS(+)/E. coli (hCOLA1) of the
present invention.

FIG. 3 is a diagram showing the complete nucleotide
sequence (SEQ ID NO. 5) and the corresponding amino acid
sequence (SEQ ID NO. 1) of the human α1 chain collagen of
the invention.

FIG. 4 is a diagram showing the hydropathy profile of
the deduced amino acid sequence of SL. Kyte-Doolittle
hydrophobicity profile of the human α1 chain collagen
plotted with an 11-residue window.

FIG. 5 is a schematic diagram showing the domain structure
of the human α1 chain collagen protein of the invention.

FIG. 6 (A-C) is a diagram showing the amino acid
sequence comparison of the human α1 chain collagen protein
of the invention with type IX (Col9α1) and type XIX (Col19α1)
collagens, wherein a black box marks identical amino acid
residues and the symbol "-" marks a gap inserted for
optimal alignment. The lower panel of FIG. 6 also shows the
evolutionary tree of these collagens.

FIG. 7(A) is a Northern blot containing 2 μg of
poly(A)+ RNA from the indicated tissues hybridized with a
human α1 chain collagen cDNA-specific probe, and with a
human β-actin-specific probe as an internal control.
FIG. 7(B) is a Northern blot containing 2 μg of poly(A)+
RNA from the indicated cardiovascular tissues hybridized
with a human α1 chain collagen cDNA-specific probe, and
with a human glyceraldehyde 3-phosphate dehydrogenase
(GAPDH) probe as an internal control.

FIG. 8 is a quantitative RT-PCR analysis of the expression
of human α1 chain collagen in human fetal and adult tissues.
Human glyceraldehyde 3-phosphate dehydrogenase was used as
an internal control.

FIG. 9 is an in situ hybridization analysis of human α1
chain collagen mRNA expression. Cardiovascular sections and
cells were hybridized with digoxigenin-labeled antisense
riboprobes for human α1 chain collagen. (A) Longitudinal
section, artery; (B) longitudinal section, ventricle; and
(C) aortic smooth muscle cells. Control hybridizations
labeled with sense probes did not produce signals (data not
shown). Bar, 10 μm.

FIG. 10 is a diagram showing the expression of the
human α1 chain collagen protein in E. coli. FIG. 10(A)
shows SDS-PAGE analysis, wherein the numbers indicated are
molecular weight standards; lane 1 is the non-induced cell
lysate; lane 2 is the cell lysate induced by IPTG for 2
hours; and lane 3 is the cell lysate induced for 3 hours.
In FIG. 10(B), lane 1 shows the human α1 chain collagen
protein purified by Ni-column and stained with Coomassie
brilliant blue; and lanes 2 and 3 are western blots detected
with anti-histidine antibody, wherein lane 2 is the
non-induced cell lysate and lane 3 is the cell lysate
induced by IPTG for 2 hours without purification.

FIG. 11 is a diagram showing RT-PCR of the recombinant
expression of human α1 chain collagen in COS7 cells,
wherein "-" refers to the negative control; "+" is the
RT-PCR product from transformants containing the human α1
chain collagen gene; and the numbers indicated are molecular
weight standards.
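
The hydropathy profile described for FIG. 4 can be sketched in a few
lines of Python. The sketch below uses the published Kyte-Doolittle
hydropathy values and the 11-residue window mentioned in the figure
description; the function name `hydropathy_profile` and the choice to
average (rather than sum) over the window are assumptions for
illustration, and the plotting program actually used for FIG. 4 is not
stated in this document.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=11):
    """Return the mean hydropathy of every full window along seq.

    The list has len(seq) - window + 1 entries; it is empty when the
    sequence is shorter than the window.
    """
    scores = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]
```

Positive values indicate hydrophobic stretches; a strongly positive
region near the N-terminus is the kind of feature usually read as a
candidate signal peptide.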
DETAILED DESCRIPTION OF THE INVENTION

The present inventors screened the most conserved
regions of known collagen nucleic acid sequences against a
human expressed sequence tag (EST) library. A novel
biomolecule with the highest homology in primary amino acid
sequence was then found by querying the EST library with the
sequence of that region. A full-length cDNA sequence of the
novel biomolecule was then obtained and determined by
bioinformatics and molecular cloning techniques.

At the beginning, a 57-bp fragment of the conserved
region was aligned against the human EST library to obtain a
fragment about 300 bp in length. The fragment was then
submitted to a GenBank BLAST search of human non-redundant
genes to obtain a fragment about 146 kb in length containing
exons and introns. Possible open reading frames were
analyzed, and corresponding oligonucleotide probes were
designed to clone the novel full-length human α1 chain
collagen. The cloning method is further described in the
following examples.

The nucleic acid sequence of the full-length human α1
chain collagen (hCOLA1) gene and the deduced amino acid
sequence thereof are shown in FIG. 3 (SEQ ID NO. 5 and SEQ
ID NO. 1, respectively). The novel human α1 chain collagen
gene comprises 2,865 bp, which encodes 954 amino acids with
a molecular weight of about 99,000 Da, and is located at
p11.2-12.3 on human chromosome XI.
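
The stated lengths are mutually consistent: 954 codons plus one stop
codon give 954 x 3 + 3 = 2,865 nucleotides. The short Python check below
makes that arithmetic explicit. It assumes that the 2,865 bp figure
refers to the open reading frame including the stop codon, which the
text does not state explicitly; the helper name `orf_length_bp` is
hypothetical.

```python
def orf_length_bp(n_amino_acids, include_stop=True):
    """Length in nucleotides of an ORF encoding n_amino_acids residues."""
    return 3 * n_amino_acids + (3 if include_stop else 0)


n_residues = 954            # amino acids stated in the text
stated_mass_da = 99000      # approximate molecular weight stated in the text

orf_bp = orf_length_bp(n_residues)

# Average mass per residue implied by the two stated figures (about 104 Da).
# This is derived arithmetic, not a measured value.
mean_residue_mass = stated_mass_da / n_residues
```

An implied average of roughly 104 Da per residue is somewhat below the
commonly quoted general average of about 110 Da, which is plausible for
a glycine-rich protein.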

The above isolated nucleic acid (hCOLA1 gene) comprises
at least the nucleotide sequence set forth in SEQ ID NO. 5
(including DNA and RNA sequences) or the complementary
sequences thereto, and the genomic DNA sequence. Those
skilled in this art will be aware that the nucleotide
sequences can be modified in accordance with any method
known in the art, and such modified sequences are also
within the scope of the invention. For example, degenerate
codons can be substituted at the corresponding positions so
that the gene still encodes the same amino acid sequence.
Further, additional codons can be inserted into the sequence
or added at either the 3'- or 5'-end, provided that the
activity of the protein is unaffected or only slightly
affected. Accordingly, the complementary sequences and
degenerate sequences of SEQ ID NO. 5, and various modified
variants, are included in the present invention. See, for
example, Sambrook, et al., Molecular Cloning, A Laboratory
Manual, Cold Spring Harbor Laboratory Press, New York, 1989.
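
To illustrate what is meant by degenerate sequences, the following
Python sketch translates two different nucleotide strings with the
standard genetic code and shows that they encode the same peptide. The
two nine-nucleotide strings are hypothetical examples chosen for
illustration; they are not fragments of SEQ ID NO. 5, and the function
name `translate` is an assumption.

```python
from itertools import product

# Standard genetic code, codons enumerated with base order T, C, A, G
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate(dna):
    """Translate a DNA coding sequence; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Two hypothetical degenerate variants encoding the same Gly-Pro-Pro triplet
variant_a = "GGTCCACCT"
variant_b = "GGCCCGCCC"
same_peptide = translate(variant_a) == translate(variant_b)
```

Here the two variants differ at every third codon position yet both
translate to "GPP", which is the sense in which a degenerate sequence
falls within the scope described above.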

In accordance with the present invention, the human α1
chain collagen encoded by the isolated hCOLA1 gene comprises
three domains (as shown in FIG. 5): (i) a von Willebrand
factor A domain (having the amino acid sequence set forth in
SEQ ID NO. 2); (ii) a thrombospondin N-terminal-like domain
(having the amino acid sequence set forth in SEQ ID NO. 3);
and (iii) a collagenous domain (having the amino acid
sequence set forth in SEQ ID NO. 4). Based on analysis of
the primary sequence of the protein and comparison with the
20 other known collagens, the human α1 chain collagen of
the present invention belongs to the FACIT family (fibril
associated collagens with interrupted triple helices). From
the domains described above, it is inferred that the human
α1 chain collagen of the invention may be involved in
platelet aggregation, cell adhesion, and the activation of
transforming growth factor. In addition, the N-terminus of
the collagen protein includes a signal peptide 22 amino
acids in length. It is therefore inferred that the human α1
chain collagen of the invention is located in the
extracellular matrix.

The present invention also provides a recombinant
vector comprising the nucleic acid cloned and isolated
above, and optionally a regulatory sequence, such as a
replication region, a selection marker (e.g. an antibiotic
resistance marker), or a promoter. The promoter used herein
can be a eukaryotic or a prokaryotic promoter, so that the
recombinant vector can be expressed in a suitable host, for
example, eukaryotic cells such as mammalian cells or yeast,
or prokaryotic cells such as Escherichia coli.

In one preferred embodiment, the present invention
provides a recombinant vector in which the hCOLA1 gene is
cloned into the Bluescript KS(+) vector (Stratagene). The
recombinant vector was deposited at the Culture Collection
and Research Center (Hsinchu, Taiwan) on November 14, 2000,
and assigned accession number CCRC 940331.

One can produce the novel human α1 chain collagen
protein, which has the amino acid sequence set forth in SEQ
ID NO. 1, using the isolated nucleic acid described above by
any suitable method in any suitable expression system known
in this art. Therefore, the method for producing human α1
chain collagen protein is also within the scope of the
present invention.

One preferred expression system for the recombinant
production of the collagen of the invention is in transgenic
non-human animals, wherein the desired collagen may be
recovered from the milk of the transgenic animal. Such a
system is constructed by operably linking the DNA sequence
encoding the collagen of the invention to a promoter and
other required or optional regulatory sequences capable of
effecting expression in the mammary gland. Likewise,
required or optional post-translational enzymes may be
produced simultaneously in the target cells, employing a
suitable expression system operable in the targeted milk
protein-producing mammary gland cells.

In one preferred embodiment of the present invention,
the nucleic acid of SEQ ID NO. 5 is subcloned into an
expression vector to obtain another recombinant vector. A
suitable host cell (for example, a eukaryotic or prokaryotic
cell) is then transformed or transfected with the
recombinant vector. The transformed or transfected cells
are then cultured under conditions sufficient for
expression of human α1 chain collagen protein. Finally,
the expressed proteins are recovered and purified. Those
skilled in this art will appreciate that the recovery and
purification method is not limited; for example, various
chromatographic methods may be used. Preferably, the human
α1 chain collagen is expressed as a histidine-tagged fusion
protein, and recovery and purification are performed on an
affinity column.

As used herein, the term "transformation" or
"transfection" includes a variety of techniques for
introducing an exogenous nucleic acid into a cell (for
example, eukaryotic or prokaryotic), including calcium
phosphate or calcium chloride precipitation, microinjection,
DEAE-dextran-mediated transfection, lipofection, or
electroporation.




Electroporation is carried out at an appropriate voltage
and capacitance (and corresponding time constant) to result
in entry of the DNA construct(s) into the host cells.
Electroporation can be carried out over a wide range of
voltages (e.g. 50 to 2,000 volts) and corresponding
capacitance. Total DNA of approximately 0.1 to 500 μg is
generally used.

Methods such as calcium phosphate precipitation,
polybrene precipitation, liposome fusion, and
receptor-mediated gene delivery can also be used to
transfect cells.

The genetic engineering methods mentioned above, such as
DNA modification, cloning, construction and isolation of
the recombinant vector, protein expression, and
purification, can be accomplished by those skilled in this
art and are described in, for example, Ausubel, F. M., et
al., Current Protocols in Molecular Biology, New York, 1992;
Sambrook, et al., supra; or Davis, L. G., Methods in
Molecular Biology, Elsevier, Amsterdam, NL, 1986.

In one aspect of the present invention, the isolated
nucleic acid further includes fragments derived from SEQ
ID NO. 5 or the complementary sequences thereto for use as a
nucleic acid probe or primer for detection. Those skilled
in the art will be aware that the length of the nucleic acid
fragment is not limited. For example, as a nucleic acid
probe, the fragment preferably comprises at least 500
contiguous nucleotides derived from SEQ ID NO. 5, while as
a nucleic acid primer, the fragment preferably comprises at
least 20 contiguous nucleotides derived from SEQ ID NO. 5.
The selection of the fragment length depends on the
conditions of the detection method as described below, for
example, the temperature and ionic strength used in
hybridization, or the temperature used in polymerase chain
reaction (PCR). Generally, the specificity of the detection
result increases with the length of the nucleic acid probe.
Accordingly, the nucleic acid probe preferably comprises at
least 500 contiguous nucleotides derived from SEQ ID NO. 5,
and more preferably comprises the full-length nucleic acid
of SEQ ID NO. 5. Likewise, the specificity of the detection
result increases with the length of the nucleic acid primer.
Accordingly, the nucleic acid primer preferably comprises at
least 20 contiguous nucleotides derived from SEQ ID NO. 5,
and more preferably comprises 20-25 contiguous nucleotides,
thereby increasing the specificity of the detection result.

The human α1 chain collagen polynucleotide of the
present invention may be used for diagnostic and/or
therapeutic purposes. For diagnostic uses, the
polynucleotide of the invention may be used to detect
human α1 chain collagen gene expression or aberrant α1
chain collagen gene expression in disease states, e.g.,
rheumatoid arthritis, osteoarthritis, reactive arthritis,
autoimmune hearing disease, cartilage inflammation due to
bacterial or viral infections (e.g. Lyme disease),
parasitic disease, bursitis, corneal diseases, ankylosing
spondylitis (fusion of the spine), and cardiovascular
disease.

The present inventors suggest that the novel collagen is
derived from blood vessels and may be related to
cardiovascular disease. Analysis of gene mutations causing
cardiovascular abnormalities can provide a basis for the
development of rational therapies in the clinical treatment
of cardiovascular disorders.

The kit of the present invention used for detecting
such diseases comprises a probe or primer described above.
Methods for detecting the expression of the hCOLA1 gene by
using the nucleic acid probe include, but are not limited
to, Northern analysis, Southern analysis, in situ
hybridization, and bio-chip/microarray analysis, which are
well known in the art. Those skilled in the art will
appreciate that methods relying on the complementarity
between two nucleic acid molecules are within the scope of
the present invention. In addition, methods for detecting
the expression of the hCOLA1 gene by using the nucleic acid
primer include, but are not limited to, reverse
transcriptase polymerase chain reaction (RT-PCR), 5'-rapid
amplification of cDNA ends (5'-RACE), and 3'-RACE. Those
skilled in the art will appreciate that methods using at
least one such primer in combination with PCR are also
within the scope of the present invention.

The human α1 chain collagen gene and/or protein of the
present invention may be useful in the treatment of various
abnormal conditions. By introducing gene sequences into
cells, gene therapy can be used to treat conditions in which
the cells underexpress normal α1 chain collagen or express
abnormal/inactive α1 chain collagen. In some instances, the
polynucleotide sequence encoding the human α1 chain
collagen of the invention is intended to replace or act in
the place of a functionally deficient endogenous gene.
Alternatively, abnormal conditions characterized by
overproliferation can be treated using the antisense of the
human α1 chain collagen coding sequence of the invention.
Recombinant gene therapy vectors, such as viral vectors, may
be engineered to express the human α1 chain collagen of the
invention. Thus, recombinant gene therapy vectors may be
used therapeutically for treatment of diseases resulting
from aberrant expression or activity of the human α1 chain
collagen of the invention.

Without intending to limit it in any manner, the
present invention will be further illustrated by the
following examples.

EXAMPLE

Example 1. cDNA cloning of hCOLA1

A Clontech SMART RACE cDNA Amplification kit was used
to clone hCOLA1 cDNA. Sequence-specific primers used for
the following RACE reactions were deduced either from the
previously published partial human genomic clone 682J15
(GenBank Accession No. AL034452) or from the cloned hCOLA1
cDNA fragment. Initially, first-strand cDNA was synthesized
from 1 ng of total RNA pool (Clontech) using Superscript II
reverse transcriptase with a specific primer,
5'-GGTTCACCTTTGCTTCCCTTAG-3', deduced from the clone 682J15.
The reaction was carried out according to the manufacturer's
protocol. The above reverse transcription reaction mixture
was used for a 5'-RACE reaction with a sequence-specific
primer (5'-TTGGCCCATTAATCCTCGGTTTC-3'), corresponding to
nucleotides 1823-1845 of the hCOLA1 cDNA, and the universal
primer provided by the kit. All assays were performed in a
50-μl reaction volume using the GeneAmp PCR System 9600
(Perkin-Elmer Cetus).

To obtain the entire coding region of the hCOLA1 gene,
first-strand cDNA was synthesized from 1 μg of total RNA
pool (Clontech) using Superscript II reverse transcriptase
with an oligo(dT) primer. After reverse transcription, 1 μl
of the reaction mixture was used for PCR amplification with
an upstream primer (5'-ATTCCTGGGCCACCTGGTCCGATA-3'),
corresponding to the most 5' candidate initiator methionine
of the clone 708F5 (GenBank Accession No. AL031782), and a
downstream primer (5'-CTAATAGTTTGGTCCTTTTCT-3'),
corresponding to the 3' stop codon of the clone 682J15. A
single band with a molecular size of 2.9 kilobases was
obtained (FIG. 1). The band was excised from the gel and
cloned into the BlueScript II KS(+) vector (Stratagene).
The recombinant vector was deposited at the Culture
Collection and Research Center (Hsinchu, Taiwan) and
assigned accession number CCRC 940331. After nucleotide
sequence analysis, the PCR product was found to contain the
entire open reading frame of hCOLA1.
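
Basic properties of the primers quoted in Example 1 can be checked with
a short Python sketch. The sketch below computes length, GC fraction,
reverse complement, and a rough melting temperature using the Wallace
rule of thumb (2 °C per A/T, 4 °C per G/C). The Wallace rule is a
generic approximation introduced here for illustration; the source does
not state how the primers were designed or what annealing temperatures
were used, and the function names are hypothetical.

```python
# Primers quoted in Example 1 (5' to 3')
PRIMERS = {
    "RT primer (from clone 682J15)": "GGTTCACCTTTGCTTCCCTTAG",
    "5'-RACE primer (nt 1823-1845)": "TTGGCCCATTAATCCTCGGTTTC",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature in deg C: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


for name, seq in PRIMERS.items():
    print(name, len(seq), round(gc_fraction(seq), 2), wallace_tm(seq))
```

The 23-nucleotide length of the 5'-RACE primer agrees with the stated
coordinate range, since 1845 - 1823 + 1 = 23.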



Example 2. Nucleotide sequencing

Nucleotide sequencing was carried out with the Sanger
dideoxynucleotide chain termination method (Sambrook, et
al., 1989). The sequence samples were prepared using the
AmpliTaq cycle sequencing kit (Perkin-Elmer, Inc.) following
the manufacturer's protocol. The samples were applied to a
377 automatic sequencer (Perkin-Elmer, Inc.). All reported
sequences were confirmed by sequencing of both sense and
antisense strands. The full-length nucleotide sequence (SEQ
ID NO. 5) and the deduced amino acid sequence (SEQ ID NO. 1)
of the human α1 chain collagen of the invention are shown in
FIG. 3.

Example 3. Northern blot analysis

The human multiple tissue and the cardiovascular
Northern blots, containing 2 μg of poly(A)+ RNA from the
indicated tissues, were obtained from Clontech (catalog
numbers 7780-1 and 7791-1, respectively). Each blot was
hybridized with a randomly primed 32P-labeled probe
corresponding to nucleotides 1236-1863 of the hCOLA1 open
reading frame at 60°C in ExpressHyb solution for one hour
and washed with 2x SSC/0.1% SDS two times for 15 min each at
60°C. Then the blot was washed with 0.2x SSC/0.1% SDS three
times for 15 min each at 60°C. A human β-actin or GAPDH
probe was used as a control for the amount of RNA in each
lane. As shown in FIG. 7A, a transcript of approximately
4.3 kb is observed, in agreement with the size of the cloned
cDNA. The expression of hCOLA1 collagen is mostly confined
to placenta and heart tissues, with lower levels in skeletal
muscle, small intestine, liver, and lung. Another transcript
of approximately 2.4 kb hybridized with the probe in heart
tissue. It is probably a splice variant of the hCOLA1 gene.
We further examined the expression pattern of hCOLA1 in
human cardiovascular tissues, comprising fetal heart and
adult heart tissues together with aortic and cardiac
tissues, by Northern blot analysis. Surprisingly, the
hCOLA1 transcripts were only present in fetal heart and
aortic tissues (FIG. 7B). Moreover, the 2.4 kb short
transcript was only present in the fetal heart. Another
7.3 kb band was detected in both tissues. We do not know
whether this is an additional splice variant of the hCOLA1
gene. No hybridization signal was detected in adult heart
and cardiac tissues. Although the absence of hCOLA1
transcript in adult heart is inconsistent with the Northern
blot data in FIG. 7A, the hCOLA1 mRNA level in fetal heart
is 22-fold higher than in adult heart based on the
quantitative RT-PCR results in FIG. 8 (see below). The
presence of the hCOLA1 transcripts in aorta suggests that
this novel collagen is derived from blood vessels.

Example 4. Quantitative RT-PCR

Five micrograms of total RNA from a variety of human
fetal and adult tissues obtained from Clontech (catalog
number K4005-1) were used for reverse transcription
reactions with oligo(dT) primers. After the reverse
transcription reactions, the relative quantity of endogenous
GAPDH mRNA in each tissue sample was determined with SYBR
Green fluorescent dye (Molecular Probes) using real-time
PCR analysis (LightCycler, Roche Molecular Biochemicals).
The resulting GAPDH mRNA value in each tissue sample was
used to normalize the sample for differences in the amount
of total RNA added to each PCR reaction. Each of the
normalized tissue samples was then split to perform the
target α1(XXI) collagen and control GAPDH amplifications by
real-time PCR analysis. The relative quantity of α1(XXI)
collagen cDNA in each reaction was determined in the
exponential phase to ensure that the amount of product
amplified reflects the quantity of starting mRNA. Primers
used for PCR amplifications are as follows: GAPDH
(5'-TGAAGGTCGGAGTCAACGGATTTGGT-3' and
5'-CATGTGGGCCATGAGGTCCACCAC-3'; 983-bp fragment); hCOLA1
collagen (5'-TTCCTGGAAACCGAGGATTAATG-3' and
5'-AGTCCACGATCACCCTTGTCAC-3'; 1546-bp fragment). Meanwhile,
samples at a PCR cycle in the linear range of amplification
(30 cycles for hCOLA1; 20 cycles for GAPDH) were
electrophoresed on 1.5% agarose and stained with ethidium
bromide for visualization. As shown in FIG. 8, when
normalized to the GAPDH values, the relative amounts of
hCOLA1 transcripts were 2.7, 22, and 30 times higher in
fetal brain, heart, and liver than in the adult
counterparts, respectively. The results indicate that hCOLA1
expression is developmentally regulated and suggest a role
for α1(XXI) collagen in developmental processes in multiple
tissues. Comparison of hCOLA1 expression in different adult
tissues reveals that high levels of hCOLA1 expression were
detected in trachea, testis, uterus, and placenta, with
modest levels of expression in brain, lung, colon, prostate,
spinal cord, and salivary gland. The hCOLA1 collagen mRNA
expression was very low or undetectable in adult heart,
liver, kidney, bone marrow, spleen, thymus, skeletal muscle,
and adrenal gland.
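
The normalization described in Example 4 amounts to dividing the
relative quantity of the target transcript by that of GAPDH in the same
sample, and then comparing fetal and adult tissue. The Python sketch
below shows that arithmetic. All numeric quantities in it are
hypothetical placeholders chosen so that the ratio comes out at 22; they
are not data from FIG. 8, and the function names are assumptions.

```python
def normalized_expression(target_qty, gapdh_qty):
    """Target quantity relative to the GAPDH quantity of the same sample."""
    if gapdh_qty <= 0:
        raise ValueError("GAPDH quantity must be positive")
    return target_qty / gapdh_qty


def fold_change(fetal, adult):
    """Fetal-to-adult ratio of GAPDH-normalized expression.

    fetal and adult are (target_qty, gapdh_qty) tuples in arbitrary
    but consistent units.
    """
    return normalized_expression(*fetal) / normalized_expression(*adult)


# Hypothetical relative quantities (arbitrary units), for illustration only
fetal_heart = (44.0, 2.0)   # (hCOLA1, GAPDH)
adult_heart = (1.5, 1.5)

ratio = fold_change(fetal_heart, adult_heart)
```

Because both samples are divided by their own GAPDH value first,
differences in the amount of input RNA cancel out before the fetal and
adult tissues are compared.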

Exam ple 5. In situ hyhridizat.i on a n a lysis 

In situ hybridization was performed on 5-µm human
cardiovascular tissue sections (Novagen, catalog number
70316-3). An antisense or sense RNA probe labeled with
digoxigenin-UTP (DIG-UTP) encompassing the region
corresponding to nucleotides 1236-1547 of the hCOLA1 open
reading frame was obtained by in vitro transcription
(Boehringer RNA labeling kit). Sections were dewaxed by
washing three times for 5 min in xylene. After dewaxing, 
sections were rehydrated to PBS through an ethanol series, 
washed three times in PBS, and then incubated for 15 min in 
a proteinase K solution (10 µg/ml in PBS). Proteinase K
activity was stopped by washing twice in PBS and sections 
were refixed at RT for 30 min in 4% paraformaldehyde, 0.2% 
glutaraldehyde in PBS. After fixation, sections were washed 
twice in PBS and then incubated for 1 h at 50°C in
pre-hybridization mix (50% formamide, 5X SSC, 50 µg/ml yeast
tRNA, 0.1% SDS and 50 µg/ml heparin). The pre-hybridization
mix was then replaced with hybridization mix containing the
probe, and the sections were incubated at 50°C overnight.
After hybridization, sections were washed twice
for 30 min at 50°C in solution I (50% formamide, 5X SSC, and
0.1% SDS) and twice for 30 min at 50°C in solution II (50%
formamide, 2X SSC, and 0.1% SDS). Sections were washed
three times at RT in MAB (100 mM maleic acid, 150 mM NaCl,
pH 7.5) and then blocked for 2 h at RT with 2% blocking
reagent (Boehringer) in MAB. Sections were incubated for 2
h at RT with 2% blocking reagent in MAB containing a 1:2000
dilution of anti-DIG antibody. The sections were washed 4
times for 15 min at RT in MAB-Tween (0.1% Tween-20) and then
three times for 5 min in AP buffer (0.1 M Tris-HCl, pH 9.0,
50 mM MgCl2, 0.1 M NaCl, and 0.1% Tween). Color was
developed by incubating the sections in NBT/BCIP. After 
developing, sections were washed in PBS and counterstained 
with nuclear fast red and then mounted with Histomount 
Mounting Solution (Zymed). In situ hybridization analysis of
cells grown on coverslips was performed according to the
previously published protocol.

Example 6. Expression of hCOLA1 in Escherichia coli and purification

The entire coding region of the hCOLA1 cDNA was
amplified by PCR with primers 5'-ATGGCTCACTATATTACATTTCTC-3',
corresponding to the 5' cDNA region, and
5'-TTAGTGATGGTGATGGTGATGCTCATAGTTTGGTCCTTTTCTG-3',
corresponding to the 3' region and including codons for 6
histidine residues immediately before the stop codon. The
amplified DNA construct was gel purified and sub-cloned into the
expression vector pET 15b (Novagen) in which the Nco I site
was digested and blunted with Klenow fragment. The
recombinant protein was obtained by expressing the
constructs in E. coli strain BL21(DE3). The transformed E.
coli was cultured in LB medium containing 100 µg/ml of
ampicillin at 37°C to reach an optical density of 0.7 at
600 nm, followed by induction with IPTG at a final
concentration of 1 mM, and culture was continued for an
additional 2 or 3 hours. The cell lysate with total proteins was
analyzed by SDS-PAGE. The result is shown in FIG. 10(A).
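
As an illustrative check only, the reverse primer given above can be read back on the coding strand to confirm that it places six histidine codons immediately before a stop codon. The helper names below are hypothetical, and the codon table is deliberately limited to the three codons needed for this check.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Only the codons needed for this check; not a full genetic code table.
CODONS = {"CAT": "H", "CAC": "H", "TAA": "*"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate codon by codon; codons outside the small table become '?'."""
    return "".join(CODONS.get(seq[i:i + 3], "?") for i in range(0, len(seq) - 2, 3))


# Reverse primer exactly as given in the text, written 5' to 3'.
REVERSE_PRIMER = "TTAGTGATGGTGATGGTGATGCTCATAGTTTGGTCCTTTTCTG"

coding_strand = reverse_complement(REVERSE_PRIMER)
tag_and_stop = coding_strand[-21:]  # last seven codons on the coding strand
print(tag_and_stop, translate(tag_and_stop))
```

The last 21 nucleotides of the coding strand correspond to the histidine tag and the stop codon; the preceding nucleotides anneal to the 3' end of the coding region.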

One liter of the IPTG-induced E. coli culture was
grown for 2 hours and then centrifuged at 5000 x g for 30
min. The cell pellet was washed with PBS and centrifuged
again. The cell pellet was then re-suspended in 20 ml of
PBS containing 1 mM of PMSF. The cell suspension was
subjected to ultrasonication to break the cell walls. The
cell lysate was then centrifuged at 30,000 x g for 40 min.
The supernatant was applied to a Ni-agarose column (5 ml in
bed volume) that had been equilibrated with 50 mM of
Tris-HCl buffer, pH 8.0, at a flow rate of 0.5 ml/min. The column
was washed with the same buffer containing 40 mM of
imidazole. The recombinant hCOLA1 was eluted with the same
buffer containing 0.25 M imidazole. The eluate was
quantified and analyzed by SDS-PAGE, followed by staining
with Coomassie brilliant blue. A protein band with a
molecular weight of 98 kDa was observed on the gel (FIG. 10(B),
lane 1). In addition, the proteins without purification
were blotted to a PVDF membrane. An antibody to histidine
tag (Clontech) was used to detect the recombinant protein.
The result of Western blot is shown in FIG. 10(B), lanes 2
and 3, in which the band indicated at 98 kDa corresponds to
the human α1 chain collagen protein of the invention.



Example 7. Expression of hCOLA1 in eukaryotic cells

The hCOLA1 cDNA containing the entire open reading frame
prepared in Example 4 was gel purified and sub-cloned into
the expression vector pcDNA3.1 containing the CMV promoter
(Invitrogen) in which the Pme I site was digested and
blunted with Klenow fragment. Mammalian COS7 cells were
transfected with the expression vector via Superfect
(Qiagen), and cultured in DMEM supplemented with 10% FBS
(Life Technologies) for 48 hours. About 10^6 cells were used
for the extraction of total RNA. Reverse transcription
was performed with oligo dT primer using 0.2 µg RNA as
template. After the reaction, PCR was carried out with primers
T7 and BGHrev on the pcDNA3.1 vector using 0.5 µl of the
reaction solution.
The result is shown in FIG. 11, indicating that the vector
is expressed in the transfected mammalian cells.

Referring to FIG. 4, the first 22 amino acid residues,
indicated by a solid bar, constitute a putative signal peptide
characteristic of secreted proteins. It is inferred that the
human α1 chain collagen protein of the invention is located
in the extracellular matrix.

The amino acid sequence of the human α1 chain collagen
protein of the invention was compared with those of the other 20
known collagens, particularly type IX and type XIX, which are
the most similar in structure. The amino acid sequence identity
between type IX collagen and hCOLA1 of the invention is 24%,
while that between type XIX collagen and hCOLA1 of the invention is
27%, indicating that hCOLA1 of the invention is a novel form
of collagen (FIG. 6).
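
For illustration, a percent identity figure of the kind quoted above can be computed from a pairwise alignment as sketched below. This is a minimal sketch under stated assumptions: the sequences are already aligned, gaps are written as "-", and identity is taken over columns in which both sequences carry a residue. The alignment method and denominator actually used for FIG. 6 are not stated here, and the fragments in the sketch are made up.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of gap-free alignment columns in which the residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # assumption: columns containing a gap are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs to compare")
    return 100.0 * matches / compared


# Toy, made-up fragments; not sequences from the invention.
print(percent_identity("GPPGA-K", "GPAGAPK"))
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, which gives somewhat different percentages for the same alignment.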

While the invention has been particularly shown and 
described with reference to the preferred embodiment
thereof, it will be understood by those skilled in the art 
that various changes in form and details may be made without 
departing from the spirit and scope of the invention. 



